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Bayesian Model Calibration is used to revisit the problem of scaling factor calibration 
for semi-empirical correction of ab initio harmonic properties (e.g. vibrational fre- 
quencies and zero-point energies). A particular attention is devoted to the evaluation 
of scaling factor uncertainty, and to its effect on the accuracy of scaled properties. 
We argue that in most cases of interest the standard calibration model is not statis- 
tically valid, in the sense that it is not able to fit a set of experimental calibration 
data within their uncertainty limits. This impairs any attempt to use the results 
of the standard model for uncertainty analysis and/or uncertainty propagation. We 
propose to include a stochastic term in the calibration model to account for model 
inadequacy. This new model is validated in the Bayesian Model Calibration frame- 
work. We provide explicit formulae for prediction uncertainty in typical limit cases: 
large and small calibration sets of data with negligible measurement uncertainty, and 
datasets with large measurement uncertainties. 
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I. INTRODUCTION 



One considers generally two types of uncertainty, arising either from random errors or 
from systematic errors. In quantum computational chemistry, random uncertainties, such 
as those arising from non-zero convergence threshold, have been shown by Irikura et al- 
to be negligible. The major uncertainty sources are biases due to basis-set and/or theory 
limitations. For quantum chemistry to be predictive, i.e. to be able to predict observables 
with confidence intervals, these biases have to be corrected. A common way to do this is by 
semi-empirical corrections, i.e. corrections by additive or multiplicative factors calibrated 
on sets of experimental data.— - — 

Semi-empirical corrections of ab initio results by linear scaling are efficient for many ob- 
servables. It is often overlooked that semi-empirical corrections are statistical operations, 
and as such, accompanied by an uncertainty which has to be considered in the uncertainty 
budget of model predictions, of which it is liable to be a major contribution. A sound uncer- 
tainty budget for these corrections is important in many circumstances. For instance, it is 
acknowledged that ZPE is a major source of uncertainty in thermochemistry with chemical 
accuracy.- - - A good evaluation of ZPE prediction uncertainty is therefore essential for the 
assessment of the accuracy of computed thermochemical properties. In another field, in- 
frared spectral fingerprinting, confidence intervals on corrected vibrational frequencies could 
help to ascertain the identification of spectral features.— - — Estimation of uncertainty on 
computational chemistry results is also of paramount importance for their transfer in multi- 
scale chemical modeling.—^ As quantum computational chemistry is at the lowest scale of 
chemical simulation, uncertainty on its results has to be carefully propagated to the higher 
scales in order to get quantified predictions. An example is the use of computational ther- 
mochemistry to predict the rates of reactions that could have a direct impact on macroscopic 
observables in combustion simulations. 19 

The concept of Virtual Measurement has been introduced by Irikura et al., 1 with the aim 
to recast model outputs in the standardized uncertainty management framework established 
for experimental measurements in the Guide to the Expression of Uncertainty in Measure- 
ment (also known as "the GUM").— To be a Virtual Measurement, a model output has to 
be qualified by a standard uncertainty or confidence interval. 

In a recent article (hereafter IJKK09) , Irikura et al— address the problem of uncertainty 
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evaluation for scaled zero-point energies (ZPE), in the continuity of their 2005 paper (here- 
after IJK05) on vibrational frequencies. 22 Scaling of harmonic vibrational frequencies^ is an 
important example of semi-empirical correction method in computational chemistry, where 
estimation of a vibrational frequency v is obtained by multiplying the corresponding har- 
monic vibrational frequency u, routinely calculated by computational chemistry codes, by 
an empirical scaling factor s (Fig. [T]) 

V — us. (1) 

Optimal scaling factors have been computed for numerous sets of theory/basis-set combinations.- 

More sophisticated scaling schemes have been designed to increase the precision of semi- 
empirical corrections. They make use of frequency-range or mode adapted scaling factors 
for frequencies,— 1 ^ or internal coordinate adapted scaling factors for force constants.—"— In 
all cases, the scaling factors are optimized to reproduce at best a set of experimental data, 
and are affected by a calibration uncertainty, which depends on a few factors, as the size 
of the calibrartion set and the precision of the data within. We focus in the following on 
the importance of this calibration uncertainty and concentrate on the widely used uniform 
scaling factors (i.e. a single scaling factor for all frequencies), without loss of generality. 

In the majority of publications about scaling factors, two summary statistics are provided 
for each theory/basis-set combination: the optimal scaling factor and the root mean squares 
deviation, characterizing the average distance between experimental and corrected values 
estimated on the calibration dataset. From a reference dataset of experimental {vexp,i} = i 
and calculated {^i} 1=1 vibrational frequencies, the optimal scaling factor obtained by the 
least-squares procedure is 

s = 2J Uiis exPti / 2J w 2 (2) 
and the quality of the correction is estimated by the root mean squares (rms) value 




To our knowledge, these values have not been explicitly used for uncertainty propagation, 
but the rms 7 provides an estimate of the residual uncertainty resulting from the scaling 
correction ("something like the target accuracy",— or "a surrogate for uncertainty" according 
to Irikura et alM), and is used as a criterion for theory/basis-set selection. 

Acknowledging that scaling factors obtained by calibration on experimental datasets are 
uncertain, Irikura et al.—^- proposed that (i) this uncertainty is the major contribution to 



prediction uncertainty using the scaling model; and (ii) prediction uncertainty is propor- 
tional to the calculated harmonic property (frequency or ZPE). These authors argue also 
that scaling factors are accurate to only two significant figures, and that all other stud- 
ies overstate their precision by reporting them with four figures. This approach has been 
adopted by the National Institute of Standards and Technology (NIST) and put into prac- 
tice in the Computational Chemistry Comparison and Benchmark DataBase (CCCBDB), 27 
section XIII. C. 2, where scaling factors are provided with uncertainties derived according to 
the procedure of IJK05/IJKK09. These results can also have a direct impact on the criteria 
to define the best basis/method level of theory for a given observable. 

In the present paper, we revisit the problem of scaling factor calibration and properties 
prediction through the Bayesian Model Calibration framework, reputed for providing con- 
sistent uncertainty evaluation and propagation.—"— Section [IT] presents the methodological 
elements used for calibration and validation procedures, which are applied to a few repre- 
sentative vibrational frequency and zero point vibrational energy datasets and compared to 
the approach by IJK05 /IJKK09 in Section IIHI We point out a statistical inconsistency in 
this approach, the main consequence being a much too large scaling factor uncertainty, from 
which misleading conclusions can be derived. A set of recommendations for reliable uncer- 
tainty estimation of scaled properties is provided in the Conclusion. Bayesian calculations 
used in this study are fairly standard and straightforward, but for the sake of completeness 
and for readers unfamiliar with statistical modeling, detailed derivations are provided in the 
Appendix. 



II. METHODS 



In the following sections, we present the calibration procedure for uniform scaling factors 
of vibrational frequencies, but it can be easily transposed to any other property usually 
computed at the harmonic level and corrected by a multiplicative scaling factor (ZPE, en- 
tropy...). It is also straightforward to transpose this procedure to semi-empirical correction 
schemes involving multiple frequency-adapted scaling factors. 
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Figure 1. Correlation plot between calculated harmonic frequencies cjj and measured frequen- 
cies iA Xp for a set of vibrations extracted from the CCCBDB for the HF/6-31G* combination of 
theory/basis-set (dots). The full line is the regression line v exp = scj; the dashed line is a visual aid 
to appreciate the bias. 

A. Scaling factor calibration 

Considering a measured frequency v ex%n one can assume that it is related to the true or 
exact value v true by 

^exp Vtrue. ^exp (4) 

where e exp ~ N(0, u 2 exp ) is a normal random variable, centered at zero with standard uncer- 
tainty u exp , which represents the measurement error. 

Calculated harmonic vibrational frequencies u are also affected by random errors, related 
to numerical convergence defined by non-zero thresholds and the choice of starting point 
in geometry optimization, and to non-zero thresholds in wave-function optimization.- It 
has been shown that these uncertainties are negligible when compared to the measurement 
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uncertainty u exp r- In the following, one can thus assume that, for one choice of theory/basis- 
set, the harmonic vibrational frequencies are computed without significant uncertainty. 



1. The standard calibration model 

If one makes the hypothesis of a linear relationship between v true and u , as popularized 
by Pople at al., 2 the standard calibration model is 



where one considers a set of i = 1, N frequencies. For a single frequency, there is an optimal 
scaling parameter s, = v e xp,i/^i- As v exp ^ is uncertain, with standard uncertainty u exp ^ 
the value of Sj cannot be known exactly and has a standard uncertainty u Si = u exPi i/uJi. 
For a calibration dataset with uniform measurement uncertainty u exp , it can be shown that 
the optimal value for s is given by the least squares solution s, Eq. [21 and its standard 



Applicability of this formula is subject to one condition: the model (Eq. \5§ has to be 
statistically valid, which means that the residuals \y exp ^ — scOi}f =l should have a normal 
distribution centered on zero, with variance u 2 exp . Normality is not always verified; 25 but 
most important, the variance condition is violated in most cases where precise data are used 
for calibration: the linear model (Eq. [5]) is typically unable to reproduce a given set of 
measured frequencies within their measurement uncertainty. Consequently, the width of the 
distribution of residuals is dominated by model misfit instead of measurement uncertainty 
(7 ^> u exp ), which invalidates the distributional hypothesis of the standard calibration model 
(Eq. [5]). In these conditions, this model should not be used to infer u s , the uncertainty of s. 
Note that this is the key point to explain statistical inconsistencies in IJK05/IJKK09,— ^ 
as will be discussed later. 

2. An improved calibration model 

An option to solve this problem would be to search for better ab initio methods, able to 
reproduce experimental properties within their measurement uncertainties. This is an active 
research area which is out of the scope of the present study^^ 5 -"— . Considering the practical 



suji + e, 



(5) 



uncertainty by u s = u exp / y^2 i=1 uif (cf. Section Hi C 3j) . 
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interest of correction by scaling factors, we rather focus on restoring statistical consistency 
by improving the calibration model. 

Observing the apparent randomness of the residuals {v exp ^ — su{\^ =l (Fig. [2]), we con- 
sider that the model misfit is not deterministically predictable. A solution to preserve a 
statistically valid linear scaling model is to introduce an additional stochastic variable e mo d 
to represent the discrepancy between model and observations 

This equation is similar to the basic statistical model introduced by Kennedy and O'Hagan 32 
for Bayesian Calibration of Model Outputs. The discrepancy variable e mo d could formally 
depend on u, but we observed on representative datasets that the residuals between modeled 
and observed frequencies are not markedly frequency dependent (Fig. [2]).— Therefore e mod 
is considered null in average, with unknown variance M^ od : 

e m od ~ N{0,u 2 mod ). (7) 

The new calibration model (Eq. [6]) depends on two unknown parameters, s and u mo d^ 



B. Model predictions and uncertainty propagation 

The new stochastic prediction model used within the calibration model (Eq. [6]), 

v = su + e mod , (8) 

is linear with respect to uncertain variables s and e modi , and one can use standard uncertainty 
propagation rules^ to estimate the average value and variance of predicted frequencies: 

77= su (9) 

= u 2 u 2 s + u 2 mod , (11) 

where s denotes the average value of the scaling factor, and u 2 its variance. 

In order to provide evaluated predictions of vibrational frequencies, we need therefore to 
estimate s, u 2 and w^ orf from a calibration dataset. This is done in the next section, using 
Bayesian Model Calibration. 
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Figure 2. Residuals between calculated harmonic frequencies cjj and measured frequencies z/j for a 
set of vibrations extracted from the CCCBDB for the HF/6-31G* combination of theory /basis-set 
(dots). Bottom: residuals as a function of oj. In order to suppress the grouping effect linked with 
frequencies, the residuals were also plotted as a function of their order in the reference set (top). 



C. Bayesian Model Calibration (BMC) 



1. General case 



Starting from the calibration model (Eq. E]), one derives the expression for the poste- 
rior probability density function (pdf) of the parameters, given a set of N measured and 
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Sample Cumulative Density Function 

Figure 3. Plot of the cumulative density function (CDF) for the residuals (same as in Fig. [2]) against 
a normal CDF shows that globally there is very little deviation from normality in this dataset. 



calculated frequencies (details of derivation are provided in Appendix IA II) 



V ^mod| \j^exp,ii ^exp,ii } j— 



OC 



FT / 2 4- 2 

Umod 1 ii = i \l U mod + U exp,i 

exp(-^ { )7 ,i ~ S( i ) \ 1- (12) 

i=l ^ \ U mod + U exp,i) 



Estimates of s, u 2 s and w^ 0(i are obtained from this pdf. In the general case, this has to be 
done numerically.— Two limit cases of interest {i.e. negligible and very large measurement 
uncertainties), amenable to analytical results, are presented in the next sections. 
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2. The case of negligible measurement uncertainties 



In the commonly met situation where the amplitude of the discrepancy between cali- 
bration model and experimental data is much larger than any other sources of uncertainty 
{u mo d ^> u exp ), we can consider the approximate calibration model 

for which the posterior pdf (Eq. [T2|) can be simplified and rearranged to (see Appendix IA 21) 

( ix iM 1 ( ( ( g - g ) 2 ^f=i^A ( ^ 

V ( s,u mo d\ {vex P ,hUi\ i=1 \ oc exp I exp —r 2 , (14) 

V 7 U mod V ZU modJ \ AU mod J 

from which one can analytically derive estimates of the parameters: 

• s — s: the average value for s is identical to the optimal value of least-squares analysis 
(Eq. ED; 

• u s , the standard uncertainty on s, is related to the rms 7 by 

= 7 ^V/ [(W - 3) ; (15) 

• and the estimate of u 2 mod is related to 7 according to 



<od = YN/(N-3). (16) 
Inserting these values in Eq. (TTJ we obtain the standard uncertainty of a predicted frequency: 



N f uj 



,2 



u ^^i— 3 {zm + 1 i- (17) 



It can be seen that for large calibration sets of few hundreds of frequencies a/ ~N~/ (N — 3) ~ 1 
and u 2 / J2i ^ I5 an d thus 

u v ~ 7. (18) 

In such conditions, it is possible to derive directly confidence intervals on scaled properties 
from the summary calibration statistics s and 7 typically provided in the literature.— _ — 
Assuming the normality of uncertainty distributions, confidence intervals can be defined for 
prediction purpose, e.g. the 95% confidence interval for v is given by 

CI g5 {u) = [sou - 1.96 u v , su + 1.96 u v \. (19) 
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3. The case of very large measurement uncertainties 



When model discrepancy is negligible compared to measurement uncertainties {u mo( i <C 
u exp ), the standard linear model is statistically valid, and one recovers the Bayesian version 
of weighted least squares. The posterior pdf for s is then 

TV N ( 1 N i v — SUj) 2 \ 

p(s\ {v exp , u u exPtU Ui}^ =l ) oc JJu^exp - eXP '\ — » ( 20 ) 

i=l \ i=l U exp,i J 

from which one obtains 

TV N 

§ = Y1 {WexPs/ulxpt) I { U i/ U lxp,i} , ( 21 ) 
i=l i=l 
TV 

ti2 = l/5^(w?/«L P|i ). (22) 
For uniform experimental uncertainty over the dataset, the scaling factor uncertainty is 



N 



\ i=i 



D. The Multiplicative Uncertainty (MU) method 

Irikura e£ a/.—, after a thorough analysis of the uncertainty sources in the ab initio 
calculation of harmonic vibrational frequencies, proposed that the major contribution to 
prediction uncertainty would be the uncertainty on the scaling factor s. They estimate u s 
from the weighted variance of s with weights dj = uf. This weighting scheme is derived in 
two steps: (1) they propose that the probability density function (pdf) for the scaling factor 
is a linear combination of pdf's for individual scaling factors in the reference set; and (2) 
from the comparison of the expression of the average value derived from this proposition 
with the least-squares solution Eq. [2j This way, they obtain a standard uncertainty 



which can be related to the rms 7 by u* ~ 7 a/ N/ ^2 ^f- 

More recently, Irikura et al— derived another expression by standard uncertainty propa- 
gation from the least-squares solution Eq. &\ adding a new term to their previous expression 

1 1 \ 1/2 
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They showed that the contribution of the latter term is negligible, validating the use of 
their former expression. Note that, unless all frequencies Ui are equal, this uncertainty u* is 
different from the dispersion of s values within the calibration set 



and attributes larger weights to the high frequencies. 

Using either of Irikura et alM^& expressions for u*, uncertainty on a scaled frequency is 
approximated by 



hence the name of "Multiplicative Uncertainty" (MU) used hereafter. 

The salient feature of Eq. |27] is that prediction uncertainty is always proportional to 
the calculated harmonic frequency, ignoring the additive term present in Eq. [TTJ Simple 
statistical validation test of the MU method have apparently not been published and are 
performed in the next sections. 

III. APPLICATIONS AND DISCUSSION 

In the following, we validate the BMC approach and compare it to the MU approach on 
representative test cases of vibrational frequencies and zero point energies. 

A. Vibrational frequencies 

The reference dataset of 2737 frequencies for the HF/6-31G* combination of theory/basis- 
set has been downloaded from the NIST/CCCBDB in July 2008.-2I Correlation between 
experimental and harmonic frequencies is plotted in Fig. [TJ 

1. Calibration 

In absence of detailed information on the measurement uncertainties for this dataset, 
and considering the typical high accuracy of spectroscopic data, we assume that they are 
negligible and apply the corresponding equations for the BMC model. Using Eqs. [TS] and 
EU we obtain s = 0.89843 ± 0.00046, and u mod = 45.35 ± 0.61 cm' 1 (Tabled]). The latter 




1/2 



(26) 



U y ~ UJU* 



(27) 
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Summary stat. MU 



BMC 



s 7(cm 1 ) u* %CI 95 u s u rnod (cm x ) %CI 95 
All frequencies (N = 2737) 

Full set 0.89843 45.33 0.025 - 0.00046 45.35 

Calibration set 0.89860 45.27 0.024 - 0.00065 45.31 

Validation set - 83.0 - - 94.6 

High frequencies, between 3180 and 3500 cm 1 (N = 479) 
Full set 0.90502 28.71 0.00869 - 0.00040 28.78 

Calibration set 0.90517 23.32 0.01005 - 0.00046 23.44 
Validation set - - - 97.4 - - 95.4 

Table I. Statistical estimates and validation for MU and BMC models for vibrational frequencies 
extracted from the CCCBDB for the HF/6-31G* combination of theory/basis-set. 

value is very close to the rms value 7 = 45.33 cm -1 , which validates the use of Eq. [TBI for 
large calibration datasets. 

For this same dataset, the CCCBDB proposes s = 0.899 ± 0.025, which can be recov- 
ered using Eq. [24] (Table [I]). The standard uncertainties on s evaluated by both methods 
differ thus by a factor 50, which can be expected to have noticeable effect on prediction 
uncertainty (see Section [HI A 51) . In order to visualize the difference, we plotted the 95% 
confidence intervals on predicted frequencies obtained from both methods (Fig. Hj). It is 
immediately visible that the the MU approach has a tendency to underestimate uncertainty 
at low frequencies and to overestimate it at high frequencies. 

2. Validation 

To better quantify this inconsistency, we performed a standard test in statistical cali- 
bration/prediction: the dataset is split randomly in two subsets, one for calibration, the 
other one for validation. Both sets are taken here of equal size (plus or minus one unit). In 
this case, one gets slightly different values of the parameters, as reported in Table [H Using 
these values, we generate 95 % confidence intervals (Eq. [TH the residuals of this dataset 
have a nearly normal distribution) and calculate the percentage of inclusion of the exper- 
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Figure 4. Residuals of the linear scaling model for a set of 2737 vibrational frequencies and the HF/6- 
31G* combination of theory/basis-set (dots). Model 95% confidence intervals for residuals: dashed 
(green) lines for the Bayesian Model Calibration method; solid (red) lines for the Multiplicative 
Uncertainty model of IJK05/IJKK09. 

imental values of the validation subset within these prediction intervals (Fig. H]). For a 
consistent predictor, one should find a frequency close to 95%. BMC succeeds for 94.7% 
of the frequencies in the validation set, whereas the MU model succeeds for only 83% (Ta- 
ble HJ). Considering the size of the samples, the difference is significant, and the statistical 
consistency of the MU approach can be questioned. When contrasted with the BMC, one 
understands that the MU method, which does not consider model inadequacy explicitly, 
"absorbs" it at least partially in u*. 

3. Test on a restricted frequency scale 

As stated in IJK05, "to apply the fractional bias correction, it is important to select a 
class of frequencies similar to the ones to be estimated".— For instance, if one selects in the 
reference set only those frequencies between 3180 and 3500 cm -1 , one gets a much more 
uniform picture than with the full reference set. 

The MU calibration procedure was done with this limited set of 479 frequencies, providing 
s = 0.9050±0.0087 (Table[T|). In this case, the uncertainty factor for s is practically identical 
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to the standard deviation calculated from the sample (0.00869 vs. 0.00871): u* ~ S s . Due 
to the restricted frequency range, one has indeed uf/'Y^ i ujf ~ 1/N , hence the identity 
between evaluations by Eqs [2H and |26j 

This set has been split in two, as before. The scaling factor obtained by MU from the 
calibration subset is now s = 0.9052 ± 0.0071, and 97.4% of the validation frequencies fall 
within the 95 % confidence interval. This result is quite close to the one obtained with BMC 
(Table U). 

It appears thus that in restrictive conditions, the MU method can be valid for reference 
sets where the individual scaling factors are uniformly distributed with regard to the har- 
monic frequencies. In such cases however, the uncertainty is recovered as the conventional 
unweighted standard deviation of the sample of individual scaling factors (Eq. |26|) . Note 
also that the MU method is used in the CCCBDB out of these favorable conditions.— 

4- Significant figures and uncertainty reporting 

Good practice in uncertainty reporting is to provide one or two significant figures for the 
uncertainty and to truncate the average/ optimal value at the same level.— If the reported 
number is to be used in further calculations (which is the case for uncertainty propagation), 
two digits is better. The common practice is to publish scaling factors for vibrational 
frequencies with four significant digits.— _ — At the risk of being pedantic, one could argue 
that they should be reported with five significant digits, e.g. s = 0.89843 ±0.00046, in sharp 
contrast with the two digits recommendation of Irikura et al.,— based on their biased scaling 
factor uncertainty evaluation. 

5. Prediction and uncertainty propagation 

The relative importance of both factors u 2 s and u 2 mod in Eq. [TT] can be evaluated on the 
example of a calculated harmonic frequency in the higher frequency range u = 3000 cm -1 
(Table E]). 

In this case, the uncertainty on the scaling factor contributes only to one thousandth 
of the total prediction variance. When dealing with large datasets of accurate vibrational 
frequencies, the uncertainty on the scaling factor can thus be neglected. The uncertainty 
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Property Theory/Basis set 




Method 


ui 2 u 2 


i, 2 

a mod 


v db u v 




Frequency HF/6-31G* 


3000 cm" 1 


BMC 


2.25 


2052.09 2695±45 cm" 1 








MU 


5625.0 




2695±75 cm" 1 




ZPE HF/6-31G* 


100 kJ moF 


1 BMC 


0.073 


0.53 


91.35±0.78 kJ moP 


-l 






MU 


2.592 




91.35±1.61 kJ moP 


-i 


ZPE B3LYP/6-31G* 


100 kJ moP 


1 BMC 


0.029 


0.19 


98.12±0.47 kJ moP 


-i 






MU 


1.061 




98.12±1.03 kJ moP 


-i 



Table II. Compared prediction uncertainty with the BMC and MU methods for a set of 2737 vibra- 
tional frequencies extracted from the CCCBDB for the HF/6-31G* combination of theory /basis-set 
and for a set of 39 ZPE of the Zl set for the HF/6-31G* and B3LYP/6-31G* combinations. 

on u mo d is also much too small to be relevant for confidence intervals calculation. One can 
therefore safely apply the uncertainty propagation formula (Eq. [TSj) . using the rms provided 
by most reference articles dealing with scaling factors calibration.—"— For smaller calibration 
sets, the rms can be seen as an inferior limit to prediction uncertainty, and Eq. [T7| would 
provide more reliable confidence intervals (see next Section). 

Comparing the prediction uncertainties for the BMC (45 cm -1 ) and MU (75 cm -1 ) meth- 
ods, one sees that the factor 50 between u s and u* observed at the calibration stage is 
partially damped at the prediction level by the fact that the BMC uncertainty is strongly 
dominated by the model inadequacy parameter u mo d- 



B. Zero Point Vibrational Energies 

We consider ZPE as an additional test because the reference datasets are considerably 
smaller than for the vibrational frequencies (e.g. 39 molecules in the Zl set of Merrick et 
alM), which is expected to emphasize the role of u s , the uncertainty on the scaling factor. 
The uncertainties reported by Irikura42 for diatomic molecules are typically very small (on 
the order of 0.01 cm" 1 ), but transposition to larger molecules is not straightforward. In 
the absence of a systematic review of measurement errors for ZPE of polyatomic molecules, 
we consider here that they can be neglected. The effect of non-negligible measurement 
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uncertainties is addressed at the end of this section. 

1. Calibration - Validation 

Using BMC with the Zl reference set, one gets s = 0.9135 ± 0.0027 and u mo d = 0.731 ± 
0.086 kJ mol -1 (Table IITT]) . which is consistent with the rms obtained by Merrick et alM for 
the HF/6-31G* theory/basis-set combination. Relative uncertainties on these parameters 
have been increased by one order of magnitude, when compared to the vibrational frequencies 
case, a direct effect of the smaller sample size. The validation test shows once more that 
the MU model fails to provide correct confidence intervals, with a score of only 0.63 for CI95 
(Table EI). 

2. Uncertainty propagation 

For such a small reference dataset, it is interesting to check if the approximate formula 
(Eq. [TB]) for uncertainty propagation, which was validated for large sets of vibrational 
frequencies still holds, i.e. if the contribution of the multiplicative term involving u s stays 
negligible or not for the larger ZPE values. If one considers a calculated ZPE of 100 kJ mol -1 
(HF/6-31G*), one has u v = ^(100 * 0.0027) 2 + 0.73 2 = 0.78 kJ mol -1 , to be compared to 
7 = 0.71 kJ mol -1 (Table IHIj) . It is to be noted also that the uncertainty on u mo d might also 
contribute, with u Umod = 0.09 kJ mol -1 . Taking all uncertainty sources into account trough 
Eq. IA27l by Monte Carlo Uncertainty Propagation (MCUP),— one gets u v = 0.77 kJ mol -1 . 
The uncertainty on u mo d can therefore be neglected. 

In the same conditions, for the combination B3LYP/6-31G*, one gets 7 = 0.42 kJ mol -1 
and u v = 0.47 kJ mol -1 , to be compared with a reference value obtained by MCUP of 
u v = 0.47 kJ mol -1 fTable lffl]) . 

There is globally only a 10% increase compared to the rms 7. In this range of ZPEs, 
7 still provides a good approximation of the prediction uncertainty (Table HB. However, 
the amplitude of the discrepancy between 7 and u v will probably increase with the size of 
the molecule. In consequence, for uncertainty propagation with ZPEs, notably for large 
molecules, it would be safer to use the full UP formula (Eq. [17)1 . involving the multiplicative 
uncertainty factor. Compilations of scaling factors for ZPE should thus report the easily 
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Summary stat. MU 



BMC 



s 7(kJ mol x ) u* %CIg5 u s ti mo ^(kJ mol 1 ) %CIg5 

HF/6-31G* 

Full set 0.9135 
Calibration set 0.9078 
Validation set 

B3LYP/6-31G* 

Full set 0.9812 0.423 0.0103 - 0.0017 0.437±0.052 

Calibration set 0.9825 0.448 0.0134 0.0032 0.478±0.083 

Validation set 0.63 1.00 

Table III. Statistical estimates and validation for MU and BMC models for a set of 39 ZPVEs of 
the Zl set for the HF/6-31G* and B2LYP/6-31G* combinations of theory /basis-set. 

calculated value of u s = 7 yWj ((N — 3) ^ujf), in addition to the rms 7. 

3. The case of non-negligible experimental uncertainties 

When the measurement uncertainty becomes comparable to the rms, model inadequacy 
should be small, and confidence intervals for prediction should account for the measure- 
ment uncertainty (Eq. [TTT) . In the absence of an exhaustive compilation of experimental 
uncertainties on measured ZPE, we performed simulations assuming a uniform uncertainty 
distribution over the full dataset. In order to test the sensitivity of the model parameters 
to this uncertainty, we repeated the estimations of the previous section, using Eq. IA6| for 
values of u exp between 0.1 and 1.0 kJ mol -1 . The results are reported in Fig. 

As expected from the properties of the posterior pdf (Eq. [T2"l) . the average/optimal value 
of the scaling factor is not sensitive to the amplitude of u exp . Moreover, we observe only a 
slight absolute increase of u s from 0.002 to 0.004. A transition from a constant u s , defined 
by the u exp = limit, to a linear increase consistent with the weighted least squares limit 
(Eq. |2"3"|) is observed around u exp = 7, where both limit equations intersect. A closer look 
shows that the transition occurs indeed at values of u exp slightly smaller than 7, in a region 
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0.707 0.0161 - 0.0027 0.731±0.086 
0.773 0.0214 0.0052 0.826±0.143 

0.63 0.95 



(u exp ~ 0.35) where u s displays a minimum. 

The evolution of the model inadequacy factor u mo d is more dramatic: it displays a sharp 
decrease and falls down to zero as soon as the measurement uncertainty reaches the value of 
the rms 7. For values of u exp below 0.25, u mo d follows the i^ mod + = 7 2 law (represented 
as a dashed line in Fig. |5]), but the calculated decrease becomes much faster in the transition 
zone. The uncertainty on u mo d increases notably in the transition region. 

In the limit of large experimental uncertainties, using Eq. [231 the uncertainty propagation 
formula can be written as 



In this case, the model inadequacy variable e mo d becomes useless, as the standard calibration 
model is statistically valid. 

This test shows that the BMC model is able to adapt nicely to various conditions 
of measurement uncertainty, with an automatic and smooth transition from the "model 
inadequacy"- to the "measurement uncertainty'-dominated regimes. 

IV. CONCLUSIONS AND RECOMMENDATIONS 

A reanalysis of the scaling factor calibration problem as stated by Irikura et al.—^ iden- 
tified two uncertainty components, besides the experimental one: a parametric uncertainty 
u s attached to the optimal scaling factor, and a model inadequacy factor u mo d accounting 
for the inability of the linear scaling correction model to reproduce sets of calibration data 
within their experimental uncertainties. A general estimation framework, based on Bayesian 
Model Calibration, has been defined and validated in cases of interest. 

The general formula for prediction of a scaled property v from a harmonic value u> is 



where s is the optimal value of the scaling factor provided by the least squares formula (Eq. 
[2]) for negligible or uniform measurement uncertainty, or more generally by the weighted least 
squares formula (Eq. |2T|) . and u v is a standard uncertainty, for which explicit expressions 
have been derived in limit cases, depending on the size and precison of the calibration set: 

• large calibration sets of precise data {u exp <C 7): u v {uj) = 7; 




(28) 



V = stu i u 



(29) 
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Figure 5. Evolution of measurement model parameters with the amplitude of an hypotheti- 
cal uniform experimental measurement uncertainty u exp on ZPE; B3LYP/6-31G* combination of 
theory/basis-set (green squares with error bars). The brown vertical dashed line indicates the value 
of the rms 7. Top panel: the red dashed lines represent the la confidence interval in the limit 
of null experimental uncertainty; the blue dashed line represent the la confidence interval in the 
weighted least squares limit. Bottom panel: the red dashed line represents the ii^ od + u 2 xp = 
law, truncated to positive values of u mo( i- 



small calibration sets of precise data (u exp 7): u v (oS) = 7 



N 
N-3 



i 1 1 1 



• calibration sets with large measurement uncertainties (u exp > 7): u u (co) = to/ l u2 
simplified to u u (u) = u exp uo / for uniform measurement uncertainty. 

The Multiplicative Uncertainty method proposed by Irikura et alM-^Z has been shown here 
to be statistically inconsistent when large frequency ranges are considered. It is only valid in 
particular situations, either when the dataset spans a restricted frequency range (in which 
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case the uncertainty is reduced to a trivial unweighted standard deviation), or in the extreme 
case of large uniform measurement uncertainty in the calibration dataset. For vibrational 
frequencies, the MU method underestimates prediction uncertainty for small values of oj and 
overestimate it (up to a factor 2) at the high end of the u scale. 

We would like to stress out that the validity of the formulas proposed above for uncer- 
tainty propagation depends to some extent on the normality of the residuals {z/, — suii}f =1 of 
the linear regression. Inspection of histograms of residuals (see e.g. Fig. 1 in Ref.— ) shows 
that this is not always the case. The usual approach of choosing an optimal theory/basis-set 
combination is to assess their performance by the rms alone, maybe weighted by computa- 
tional cost considerations.— - — Researchers concerned by prediction uncertainty might also 
consider an additional "normality criterion" in order to reject theory/basis-set combinations 
providing non-normal residuals and from which the summary statistics cannot be used reli- 
ably for uncertainty propagation. Analysis of restricted ranges of data as presently done by 
some authors for vibrational frequencies^ 1 ^ is one way to improve the normality of residuals, 
but as demonstrated above, prediction from small calibration sets calls for more information 
than the rms. 

A. Recommendations to calibrators of scaling factors 

1. For large calibration sets of accurate data, as the ones used for calibration of uniform 
scaling factors for vibrational frequencies, reliable prediction uncertainty can be simply 
based on the rms 7 (Eq. [3]). In this case, prediction uncertainty is purely additive. 

2. For much smaller datasets of a few dozens of data or less, as in the case of ZPEs 
or mode-specific frequencies, one has a combination of additive and multiplica- 
tive uncertainty (or rather, variance). Ideally, uncertainty on the scaling factor 
u s = 7 a/ N/ ((N — 3) ^2 oof) should be reported along with the additive term u mo d = 
7 a/A/ (N — 3), for use in the general uncertainty propagation equation u 2 = u 2 u 2 + 
u mod (Eq- fTTj) . It certainly would be a large step towards the general applicability of 
the Virtual Measurement concept,- if statistically pertinent estimators were systemat- 
ically reported in the literature devoted to the calibration of semi-empirical correction 
parameters. 
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3. An indicator of the normality of the residuals in the calibration dataset would also be 
welcomed. 

B. Recommendations to users of scaling factors 

1. For the end user of scaling factors, it is important to remind, as pertinently stated 
by Irikura et al^-^-, that semi-empirical correction of a property by scaling is not a 
deterministic procedure: a scaled property has an attached uncertainty, which depends 
on the level of theory/basis-set used for the calculation of harmonic properties (it 
depends also on the quantity and quality of the calibration dataset, but this is out of 
reach of the end user). 

2. In the present state of affairs, the best estimate of the prediction uncertainty available 
for most levels of theory/basis-set is the rms 7, i.e. one has to assume u v ~ 7.— ~— 
The use of the multiplicative scaling factor uncertainty as reported presently (March 
2010) in the CCCBDB 27 cannot be recommended for the estimation of uncertainty of 
scaled properties. 

3. Users are encouraged to 

(a) report the uncertainty along with the scaled properties, i.e. v = su ± u u , and 

(b) account for uncertainty when scaled properties are used as inputs to a model2&^, 
or for comparison with experimental data. 

4. One has to be conscious that 7 provides only a lower limit of the uncertainty for 
properties with small calibration data sets (e.g. ZPE). For numerical examples, see 
Table [TT1 
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Appendix A: Appendix 



1. Bayesian analysis of scaling factor calibration model 

We consider the calibration model 

^exp,i SUJi ~\- dmod ~\~ ^exp,ii (-^--Q 

where 

5 u exp,i) i s the measurement uncertainty of v e xp,i, and e mo d ~ A^(0, , u 2 mod ) 
is a variable accounting for the discrepancy between the linear model and the observations. 
This model has two unknown parameters, s and u mo d, to be estimated on a calibration 
dataset consisting of N calculated harmonic frequencies {u>i}f =1 , and their corresponding 
experimental frequencies {v e x P ,i, Uex P ,i}f=v 

In the Bayesian data analysis framework,—^ all information about parameters can be 
obtained from the joint posterior pdf p (s,u mo d\ {vexp,%-, u eX p,i, ^^iLij- 111 order to simplify 
the notations, we will omit in the following the list indices when they are not necessary. 
This pdf is obtained through Bayes theorem 

p(s | {Vexp, Uexp, ^}) OC p({v exp } \s u})p(s,u mod ), (A2) 

where p({v e xp} \s,u m0( i, {u exp ,u}) is the likelihood function and p(s,u mo d) is the prior pdf. 

In the hypothesis where the difference between observation and corrected frequency is 
expected to arise from a normal distribution 

v exP)i - sui ~ JV(0, u 2 mod + u 2 expi ), (A3) 

the likelihood function for a single observed frequency is 

P^exp,i\ S , U mod,U expji ,Ui) = (2tT + ) " 1/2 exp | P,t ^ | . (A4) 

\ A U mod ' U exp,i J 

Considering that all frequencies are measured independently (with uncorrelated uncertainty) 
the joint likelihood is the product of the individual ones, i.e. 

N 

-1/2 

^71 {u mod i- u exp i )) 

i=l 



V {{vexp} \s, u mod) {u exp , u}) = Yl (2vr (u 2 mod + u 2 expi )) 1/2 x 



N 

/ 1 ' 

exp 



1 V"^ {Vexp.i - SUJif \ 



^ f=1 U mod + U exp,i 
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As there is a priori no correlation between s and w mo d, we use a factorized prior pdf 
p{s,u mo( i) = p{s)p(u mo d) ■ In the absence of a priori quantified information on s, a uni- 
form pdf p(s) = cte is used. For u mo d, we enforce a positivity constraint through a Jeffrey's 
prior, p(u moc i) oc u^ od .— The posterior pdf is finally defined up to a norm factor which is 
irrelevant for the following developments 

N 

1/2 



p(s, ^modl \yexpi & S ^exp}) OC U mod J^J {^mod ^exp,i) 



exp 



x 

8=1 

1 \- {Vexp,i ~ -S^i) 2 \ / a n\ 

^ i= i a mod ~ u exp,i J 



2. Case of negligible measurement uncertainties 

For the analysis of vibrational frequencies, it is generally considered that experimental 
uncertainties are negligible when compared to model inadequacy {u exPt i <C u mo d). The 
general expression for the posterior pdf (Eq. IA6I) can then be simplified accordingly: 

P{s, U mod \ {v exp , UJ}) OC U m ^ d 1 exp -— -2 ^ ( V exp,i - surf . (A7) 

V ZU mod i=1 J 

Using Eq. [2] and [3] we derive the identity (see e.g. Ref. 33 (, Eq. 9.4, p. 214)) 

JV 

(vexp,i - su t ) 2 = (s- s) 2 Y,^ + iV7 2 , (A8) 

i=l 

which enables to write the posterior pdf in a convenient factorized form 

p(s, u mod \ {v expi u}) oc u^ N od ~ l exp (-^L\ exp f- ^ S \ (A9) 

V ZU modJ V ZU mod J 

from which we can derive analytical estimates for the parameters and their uncertainties. 



a. Estimation of s 

The marginal density for s is obtained by integration over u mo d 
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POO 

p(s\ {i>ex P , w}) = / du mod p(s,u mod \ {lS exp ,Uj}) (A10) 
JO 

f°° ( 1 r 2 \ 

oc / du mod u^ od ~ x exp -— -2 — >J (^xp.i - suji) 2 (All) 

oc (z/ ex . p ,i - swi) 2 , (A12) 



. i=l 

which, using Eq. \A8\ can be rewritten as 



I ~\2 2\ -iV/2 

y.(.s| {!/«„, w}) oc ( 1 + {a ~ 8 ^t Ui ) ' ( Al3 ) 



and has the shape of a Student's distribution^ 

„2\ -(n+l)/2 



Stt(x) oc ( 1 + — ) . (A14) 



n 

Posing n = N — 1 and x 2 = (N — 1)/N (s — s) 2 uf/j 2 , we can directly use the properties 
of the Student's distribution 

E[x] = 0; Var[x] = n/(n - 2), (A15) 

to derive 

E[s] = s = s (A16) 

Var[s] = u 2 = 7 2 — _ ^ - ^ 2 Var[s] (A17) 

= 72 w4l>F (A18) 

6. Estimation of u mod 

The marginal density for the standard uncertainty of the stochastic variable e mod is 

/oo 

dsp(s,u mod \ {u exp ,uj}) (A19) 
-oo 

K i exp r ds exp f _ <£^IH) (A20) 



jv+i i 2 2 / / I 2 M 2 

U mod V Z ' a modJ J -oo V ZU mod 



2 



"mod V z ' a mod 
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Using the formula 



J™ dx x- n e- a ' x2 = /a(n " lV2 (A22) 



to recover the normalization constant of p(u mo d\ {v e xp,^}) and to calculate mean values of 
u mo d and u 2 mod ^ one obtains readily the following estimates 

Umod = 7 (A23) 
[NT\(N -2)/2] , k x 

^=^H nN -m] < A24 > 

N 



<od = ]^7 2 (A25) 



N N fr [(N — 2)/2]\ 
N~^3 ~~ ~2 \T[{N- 1)/2]J ' 



(A26) 



3. Prediction and uncertainty propagation 

In the Bayesian framework, the posterior pdf p(s, u mo d\ {v e xp, u exp , u>}) can be used to 
estimate the uncertainty of predicted frequencies by the law of propagation of distribution^ 

p W, «W. «}> = / ds AwM., -Op »-l "W «}) • (A27) 
where 

7T~2 ( A28 ) 

AU mod J 

translates the stochastic prediction model (Eq. [8]) as a pdf . This integral has generally to 
be evaluated numerically. 
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